(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 942 409 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 15.09.1999 Bulletin 1999/37

(51) Int. Cl.: G10L 5/04

(21) Application number: 99301674.0

(22) Date of filing: 05.03.1999

(84) Designated Contracting States:
     AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
     Designated Extension States:
     AL LT LV MK RO SI

(30) Priority: 09.03.1998 JP 05724998

(71) Applicant: CANON KABUSHIKI KAISHA
     Tokyo (JP)

(72) Inventor: Yamada, Masayuki
     Tokyo (JP)

(74) Representative:
     Beresford, Keith Denis Lewis et al
     BERESFORD & Co.
     2-5 Warwick Court
     High Holborn
     London WC1R 5DJ (GB)

(54) Phoneme based speech synthesis

(57) A second phoneme is generated in consideration of a
phonemic context with respect to a first phoneme as a search
target. Phonemic piece data corresponding to the second
phoneme is searched out from a database. A third phoneme is
generated by changing the phonemic context on the basis of the
search result, and phonemic piece data corresponding to the
third phoneme is re-searched out from the database. The search
or re-search result is registered in a table in correspondence
with the second or third phoneme.

Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to a speech synthesis
apparatus which has a database for managing phonemic piece
data and performs speech synthesis by using the phonemic piece
data managed by the database, a control method for the
apparatus, and a computer-readable memory.
[0002] As a conventional speech synthesis method, a synthesis
method based on a waveform concatenation scheme is available.
In the waveform concatenation synthesis method, the prosody is
changed by the pitch synchronous waveform overlap adding
method of pasting waveform element pieces, each corresponding
to one to several pitch periods, at desired pitch intervals.
The waveform concatenation synthesis method can obtain more
natural synthetic speech than a synthesis method based on a
parametric scheme, but suffers from the problem of a narrow
allowable range with respect to changes in prosody.
[0003] Under the circumstances, attempts are made to improve
the speech quality by preparing various speech data and
properly selecting and using them. As a criterion for
selection of speech data, information such as a phonemic
context (a phoneme to be synthesized or a few phonemes on two
sides of the target phoneme) or a fundamental frequency F0 is
used.
[0004] The following problems are, however, posed in the above
conventional speech synthesis method.
[0005] If, for example, there is no data that satisfies a
phonemic context as a synthesis target, a search for necessary
speech data is made again by relaxing the condition associated
with the phonemic context. The execution of this re-search
during speech synthesis complicates the processing, resulting
in an increase in processing time. In addition, when the
fundamental frequency F0 is to be used as a criterion for
selection of speech data, every item of speech data must be
evaluated in association with the fundamental frequency F0 to
obtain the speech data that best matches the fundamental
frequency F0 of the speech data to be synthesized.

SUMMARY OF THE INVENTION 

[0006] The present invention has been made in consideration of
the above problems, and has as its object to provide a speech
synthesis apparatus capable of performing speech synthesis
with high precision at high speed, a control method therefor,
and a computer-readable memory.

[0007] In order to achieve the above object, a speech
synthesis apparatus according to the present invention has the
following arrangement.

[0008] There is provided a speech synthesis apparatus having a
database for managing phonemic piece data, comprising:



generating means for generating a second pho- 
neme in consideration of a phonemic context for a 
first phoneme as a search target; 
search means for searching the database for a pho- 
nemic piece data corresponding to the second pho- 
neme; 

re-search means for generating a third phoneme by 
changing the phonemic context on the basis of the 
search result obtained by the search means, and 
re-searching the database for phonemic piece data 
corresponding to the third phoneme; and 
registration means for registering the search result 
obtained by the search means or the re-search 
means in a table in correspondence with the second 
or third phoneme. 

[0009] In order to achieve the above object, a speech
synthesis apparatus according to the present invention has the
following arrangement.
[0010] There is provided a speech synthesis apparatus for
performing speech synthesis by using phonemic piece data
managed by a database, comprising:

storage means for storing a table for managing po- 
sition information indicating a position of phonemic 
piece data in the database in correspondence with 
a phoneme obtained in consideration of a phonemic 
context made to correspond to the phonemic piece 
data; 

calculation means for acquiring each phonemic 
context information of a phoneme group as a syn- 
thesis target and fundamental frequencies corre- 
sponding thereto and calculating an average of ac- 
quired fundamental frequencies; 
search means for searching a phoneme group cor- 
responding to the phonemic context information 
from the table; 

acquisition means for acquiring, from the table, po- 
sition information of phonemic piece data corre- 
sponding to a predetermined phoneme of the pho- 
neme group searched out by the search means, on 
the basis of the average of fundamental frequencies 
calculated by the calculation means; and 
changing means for acquiring phonemic piece data 
indicated by the position information acquired by 
the acquisition means from the database, and 
changing a prosody of the acquired phonemic piece 
data. 

[0011] In order to achieve the above object, a control
method for a speech synthesis apparatus according to 
the present invention has the following steps. 
[0012] There is provided a control method for a 
speech synthesis apparatus having a database for man- 
aging phonemic piece data, comprising: 

the generating step of generating a second pho- 
neme in consideration of a phonemic context for a 
first phoneme as a search target;
the search step of searching the database for a pho- 
nemic piece data corresponding to the second pho- 
neme; 

the re-search step of generating a third phoneme 
by changing the phonemic context on the basis of 
the search result obtained in the search step, and 
re-searching the database for phonemic piece data 
corresponding to the third phoneme; and 
the registration step of registering the search result 
obtained in the search step or the re-search step in 
a table in correspondence with the second or third 
phoneme. 

[0013] In order to achieve the above object, a control
method for a speech synthesis apparatus according to 
the present invention has the following steps. 
[0014] There is provided a control method for a 
speech synthesis apparatus for performing speech syn- 
thesis by using phonemic piece data managed by a da- 
tabase, comprising: 

the storage step of storing a table for managing po- 
sition information indicating a position of phonemic 
piece data in the database in correspondence with 
a phoneme obtained in consideration of a phonemic 
context made to correspond to the phonemic piece 
data; 

the calculation step of acquiring each phonemic 
context information of a phoneme group as a syn- 
thesis target and fundamental frequencies corre- 
sponding thereto and calculating an average of ac- 
quired fundamental frequencies; 
the search step of searching a phoneme group cor- 
responding to the phonemic context information 
from the table; 

the acquisition step of acquiring, from the table, po- 
sition information of phonemic piece data corre- 
sponding to a predetermined phoneme of the pho- 
neme group searched out in the search step, on the 
basis of the average of fundamental frequencies 
calculated in the calculation step; and 
the changing step of acquiring phonemic piece data 
indicated by the position information acquired in the 
acquisition step from the database, and changing a 
prosody of the acquired phonemic piece data. 

[0015] In order to achieve the above object, a
computer-readable memory according to the present invention
has the following program codes. 
[0016] There is provided a computer-readable mem- 
ory storing program codes for controlling a speech syn- 
thesis apparatus having a database for managing pho- 
nemic piece data, comprising: 

a program code for the generating step of generat- 
ing a second phoneme in consideration of a phone- 
mic context for a first phoneme as a search target; 



a program code for the search step of searching the 
database for a phonemic piece data corresponding 
to the second phoneme; 

a program code for the re-search step of generating 
a third phoneme by changing the phonemic context 
on the basis of the search result obtained in the 
search step, and re-searching the database for pho- 
nemic piece data corresponding to the third pho- 
neme; and 

a program code for the registration step of register- 
ing the search result obtained in the search step or 
the re-search step in a table in correspondence with 
the second or third phoneme. 



[0017] In order to achieve the above object, a
computer-readable memory according to the present invention
has the following program codes. 
[0018] There is provided a computer-readable memory storing
program codes for controlling a speech synthesis apparatus for
performing speech synthesis by using phonemic piece data
managed by a database, comprising:

a program code for the storage step of storing a table
for managing position information indicating a
position of phonemic piece data in the database in 
correspondence with a phoneme obtained in con- 
sideration of a phonemic context made to corre- 
spond to the phonemic piece data; 
a program code for the calculation step of acquiring 
each phonemic context information of a phoneme 
group as a synthesis target and fundamental fre- 
quencies corresponding thereto and calculating an 
average of acquired fundamental frequencies; 
a program code for the search step of searching a 
phoneme group corresponding to the phonemic 
context information from the table; 
a program code for the acquisition step of acquiring, 
from the table, position information of phonemic 
piece data corresponding to a predetermined pho- 
neme of the phoneme group searched out in the 
search step, on the basis of the average of funda- 
mental frequencies calculated in the calculation 
step; and 

a program code for the changing step of acquiring 
phonemic piece data indicated by the position infor- 
mation acquired in the acquisition step from the da- 
tabase, and changing a prosody of the acquired 
phonemic piece data. 



[0019] According to the present invention described above, a
speech synthesis apparatus capable of performing speech
synthesis with high precision at high speed, a control method
therefor, and a computer-readable memory can be provided.

[0020] Other features and advantages of the present invention
will be apparent from the following description taken in
conjunction with the accompanying drawings, in which like
reference characters designate the same or similar parts
throughout the figures thereof.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0021] 

Fig. 1 is a block diagram showing the arrangement 
of a speech synthesis apparatus according to the 
first embodiment of the present invention; 
Fig. 2 is a flow chart showing search processing ex- 
ecuted in the first embodiment of the present inven- 
tion; 

Fig. 3 is a view showing an index managed in the 
first embodiment of the present invention; 
Fig. 4 is a flow chart showing speech synthesis 
processing executed in the first embodiment of the 
present invention; 

Fig. 5 is a view showing a table obtained from the 
index managed in the first embodiment of the 
present invention; 

Fig. 6 is a flow chart showing search processing
executed in the second embodiment of the present
invention;

Fig. 7 is a view showing an index managed in the
second embodiment of the present invention;
Fig. 8 is a flow chart showing search processing ex- 
ecuted in the third embodiment of the present in- 
vention; and 

Fig. 9 is a view showing an index managed in the 
third embodiment of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

[0022] Fig. 1 is a block diagram showing the arrange- 
ment of a speech synthesis apparatus according to the 
first embodiment of the present invention. 
[0023] Reference numeral 103 denotes a CPU for performing
numerical operation/control, control on the respective
components of the apparatus, and the like, which are executed
in the present invention; 102, a RAM serving as a work area
for processing executed in the present invention and a
temporary saving area for various data; 101, a ROM storing
various control programs such as programs executed in the
present invention, and having an area for storing a database
101a for managing phonemic piece data used for speech
synthesis; 109, an external storage unit serving as an area
for storing processed data; and 105, a D/A converter for
converting the digital speech data synthesized by the speech
synthesis apparatus into analog speech data and outputting it
from a loudspeaker 110.
[0024] Reference numeral 106 denotes a display control unit
for controlling a display 111 when the processing state and
processing results of the speech synthesis apparatus, and a
user interface are to be displayed; 107, an input control unit
for recognizing key information input from a keyboard 112 and
executing the designated processing; 108, a communication
control unit for controlling transmission/reception of data
through a communication network 113; and 104, a bus for
connecting the respective components of the speech synthesis
apparatus to each other.

[0025] Of the processing executed in the first embodiment, the
search processing of searching for a target phoneme will be
described next with reference to Fig. 2.
[0026] Fig. 2 is a flow chart showing search processing
executed in the first embodiment of the present invention.

[0027] In the first embodiment, the two phonemes on both sides
of each phoneme, i.e., the phonemes serving as its right and
left phonemic contexts, are used as phonemic contexts. A
phoneme taken together with these contexts is called a
triphone.

[0028] First of all, in step S1, a phoneme p as a search
target from the database 101a is initialized to a triphone
ptr. In step S2, a search is made for the phoneme p from the
database 101a. More specifically, a search is made for
phonemic piece data having label p indicating the phoneme p.
It is then checked in step S4 whether the phoneme p is present
in the database 101a. If it is determined that the phoneme p
is not present (NO in step S4), the flow advances to step S3
to change the search target to a substitute phoneme having
lower phonemic context dependency than the phoneme p. If the
phoneme p matching the triphone ptr is not present in the
database 101a, the phoneme p is changed to the right phonemic
context dependent phoneme. If the right phonemic context
dependent phoneme is not present either, the phoneme p is
changed to the left phonemic context dependent phoneme. If the
left phonemic context dependent phoneme is not present either,
the phoneme p is changed to a phoneme that is independent of
any phonemic context. Alternatively, a high priority may be
given to a left phonemic context phoneme for a vowel, and a
high priority may be given to a right phonemic context phoneme
for a consonant. In addition, if there is no phoneme p that
matches the triphone ptr, one or both of the left and right
phonemic contexts may be replaced with similar phonemic
contexts. For example, "k" (the consonant of the "ka" column
in the Japanese syllabary) may be used as a substitute when
the right phonemic context is "p" (the consonant of the "pa"
column, which is a modified "ha" column in the Japanese
syllabary). Note that the Japanese syllabary is the basic
Japanese phonetic character set. The character set can be
arranged in a matrix of five (5) rows and ten (10) columns.
The five rows are respectively the five vowels of the English
language, and the ten columns consist of nine consonants and
the column of the five vowels. A phonetic (sound) character is
represented by the sound resulting from combining a column
character and a row character, e.g., column "t" and row "e" is
pronounced "te"; column "s" and row "o" is pronounced "so".
After the phoneme p as the search condition is changed in this
manner, the flow returns to step S2.
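
The back-off order of steps S2 to S4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the database is assumed to be a plain dictionary, labels are assumed to be (left, phoneme, right) tuples with None marking a dropped context, and the names candidate_labels and search_with_backoff are hypothetical. Only the default order (triphone, right context, left context, context independent) is shown; the vowel/consonant priority and similar-context substitution variants are omitted.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label format: (left context, phoneme, right context).
# None means that the context on that side is ignored.
Label = Tuple[Optional[str], str, Optional[str]]


def candidate_labels(triphone: Tuple[str, str, str]) -> List[Label]:
    """Return search labels from most to least context dependent."""
    left, center, right = triphone
    return [
        (left, center, right),   # full triphone
        (None, center, right),   # right-context-dependent phoneme
        (left, center, None),    # left-context-dependent phoneme
        (None, center, None),    # context-independent phoneme
    ]


def search_with_backoff(
    database: Dict[Label, List[int]], triphone: Tuple[str, str, str]
) -> Tuple[Optional[Label], List[int]]:
    """Search the database, relaxing the context until data is found.

    Returns the label that matched and its phonemic piece positions,
    or (None, []) if even the context-independent phoneme is absent.
    """
    for label in candidate_labels(triphone):
        pieces = database.get(label, [])
        if pieces:
            return label, pieces
    return None, []
```

Because this search runs once per triphone when the index is built, the synthesis stage never has to repeat it.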

[0029] If it is determined that the phoneme p is present (YES
in step S4), the flow advances to step S5 to calculate a mean
F0 (the mean of the fundamental frequencies from the start of
phonemic piece data to the end). Note that this calculation
may be performed with respect to a logarithm F0 (function of
time) or linear F0. Furthermore, the mean F0 of unvoiced
speech may be set to 0 or estimated from the mean F0 of
phonemic piece data of phonemes on both sides of the phoneme p
by some method.
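
A mean F0 calculation of the kind used in step S5 might look like the sketch below. The representation of the F0 track as a list of per-frame values in Hz, the convention that a value of 0 or less marks an unvoiced frame, and the function name mean_f0 are all assumptions made for illustration. The sketch takes the "set to 0" option for fully unvoiced pieces and does not attempt the estimation from neighbouring phonemes.

```python
import math
from typing import Sequence


def mean_f0(f0_track: Sequence[float], use_log: bool = False) -> float:
    """Mean F0 over a phonemic piece, from its first frame to its last.

    Frames with F0 <= 0 are treated as unvoiced and skipped (an
    assumption of this sketch). A piece with no voiced frame gets 0.
    With use_log=True the mean is taken over log F0 and the result
    stays in the log domain.
    """
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        return 0.0
    if use_log:
        return sum(math.log(f) for f in voiced) / len(voiced)
    return sum(voiced) / len(voiced)
```

Whichever domain is chosen, the same domain has to be used for the target F0 trajectory at synthesis time so that the two means are comparable.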

[0030] In step S6, the respective searched phonemic 
piece data are aligned (sorted) on the basis of the cal- 
culated mean F0. In step S7, the sorted phonemic piece 
data are registered in correspondence with the triphone 
ptr. As a result of registration, an index like the one 
shown in Fig. 3 is obtained, which indicates the corre- 
spondence between generated phonemic piece data 
and triphones. As shown in Fig. 3, in the pointers man- 
aged in correspondence with the triphones, "phonemic 
piece position" indicating the location of each phonemic 
piece data in the database 101a and "mean F0" are 
managed in the form of a table. 
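
Steps S6 and S7 can be pictured with the following sketch. The index is assumed to be a dictionary keyed by a triphone label, and each table entry is assumed to be a (mean F0, phonemic piece position) pair; the label string, the positions, and the name register_sorted are hypothetical.

```python
from typing import Dict, List, Tuple

# One table row: (mean F0, phonemic piece position in the database).
Entry = Tuple[float, int]


def register_sorted(
    index: Dict[str, List[Entry]], triphone: str, entries: List[Entry]
) -> None:
    """Sort the found pieces by mean F0 and register them for the triphone."""
    index[triphone] = sorted(entries, key=lambda entry: entry[0])


# Illustrative data only; the positions and F0 values are made up.
index: Dict[str, List[Entry]] = {}
register_sorted(index, "a.A.b", [(210.0, 42), (95.5, 7), (150.0, 19)])
```

Keeping each table sorted by mean F0 is what later allows the nearest entry to be found without scanning every piece.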
[0031] Steps S1 to S7 are repeated for all conceivable 
triphones. It is then checked in step S8 whether the 
processing for all the triphones is complete. If it is de- 
termined that the processing is not complete (NO in step 
S8), the flow returns to step S1. If it is determined that 
the processing is complete (YES in step S8), the 
processing is terminated. 

[0032] Speech synthesis processing of performing 
speech synthesis by searching for phonemic piece data 
of a phoneme as a synthesis target using the index gen- 
erated by the processing described with reference to 
Fig. 2 will be described next with reference to Fig. 4. 
[0033] Fig. 4 is a flow chart showing the speech syn- 
thesis processing executed in the first embodiment of 
the present invention. 

[0034] When speech synthesis processing is to be 
performed, the triphone context ptr of the phoneme p as 
a synthesis target and F0 trajectory are given. Speech 
synthesis is then performed by searching phonemic 
piece data of phonemes on the basis of mean F0 and 
triphone context ptr and using the waveform overlap 
adding method. 

[0035] First of all, in step S9, a mean F0', which is the mean
of the given F0 trajectory of the synthesis target, is
calculated. In step S10, a table for managing the phonemic
piece position of phonemic piece data corresponding to the
triphone ptr of the phoneme p is searched out from the index
shown in Fig. 3. If, for example, the triphone ptr is
"a. A. b", the table shown in Fig. 5 is obtained from the
index shown in Fig. 3. Since proper substitute phonemes have
been obtained by the above search processing, the result of
this step never becomes empty.
[0036] In step S11, the phonemic piece position of phonemic
piece data having the mean F0 nearest to the mean F0' is
obtained on the basis of the table obtained in step S10. In
this case, since the phonemic piece data have been sorted by
the above search processing on the basis of mean F0, a search
can be made by using a binary search method or the like. In
step S12, phonemic piece data is retrieved from the database
101a in accordance with the phonemic piece position obtained
in step S11. In step S13, the prosody of the phonemic piece
data obtained in step S12 is changed by using the waveform
overlap adding method.
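
The nearest-mean-F0 lookup of step S11 can be sketched with a binary search from the Python standard library. The table is assumed here to be stored as two parallel lists, one of mean F0 values in ascending order and one of phonemic piece positions; this layout, the tie-breaking rule, and the name nearest_position are assumptions of the sketch.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_position(
    mean_f0s: Sequence[float], positions: Sequence[int], target_f0: float
) -> int:
    """Return the position whose mean F0 is nearest to target_f0.

    mean_f0s must be sorted in ascending order and be non-empty;
    positions[i] belongs to mean_f0s[i]. Ties go to the lower F0.
    """
    if not mean_f0s:
        raise ValueError("table must not be empty")
    i = bisect_left(mean_f0s, target_f0)
    if i == 0:
        return positions[0]
    if i == len(mean_f0s):
        return positions[-1]
    below, above = mean_f0s[i - 1], mean_f0s[i]
    if target_f0 - below <= above - target_f0:
        return positions[i - 1]
    return positions[i]
```

For a table of n pieces this takes on the order of log n comparisons, instead of evaluating every piece against the target F0.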

[0037] As described above, according to the first embodiment,
the presence/absence of phonemic piece data is checked in
advance with respect to all the conceivable phonemic contexts,
and substitute phonemes are prepared in advance for the
contexts for which phonemic piece data is determined to be
absent. This simplifies the processing and increases the
processing speed. In addition, information associated with the
mean F0 of phonemic piece data present in each phonemic
context is extracted in advance, and the phonemic piece data
are managed on the basis of the extracted information. This
can increase the processing speed of speech synthesis
processing.

[Second Embodiment] 

[0038] Quantization of the mean F0 of phonemic piece data may
replace calculation of the mean F0 of continuous phonemic
piece data in step S5 in Fig. 2 in the first embodiment. This
processing will be described with reference to Fig. 6.

[0039] Fig. 6 is a flow chart showing search processing
executed in the second embodiment of the present invention.

[0040] Note that the same step numbers in Fig. 6 denote the
same processes as those in Fig. 2 in the first embodiment, and
a detailed description thereof will be omitted.

[0041] In step S14, a mean F0 of the phonemic piece data of
searched phonemes p is quantized to obtain the quantized mean
F0 (obtained by quantizing the mean F0 as a continuous value
at certain intervals). This calculation may be performed for
the logarithm F0 or linear F0. In addition, the mean F0 of
unvoiced speech may be set to 0, or may be estimated from the
mean F0 of phonemic piece data on both sides of the unvoiced
speech by some method.
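
The quantization of step S14 amounts to snapping a continuous mean F0 to the nearest level of a fixed grid. The sketch below assumes a uniform grid in the linear domain with a step of 10 Hz; the text only says "certain intervals", so the step value and the name quantize_f0 are illustrative assumptions.

```python
import math


def quantize_f0(mean_f0: float, step: float = 10.0) -> float:
    """Round a mean F0 to the nearest multiple of step.

    A mean F0 of 0 or less (unvoiced in this sketch) stays 0.
    math.floor(x + 0.5) is used instead of round() so that values
    exactly halfway between two levels always round upward.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if mean_f0 <= 0:
        return 0.0
    return math.floor(mean_f0 / step + 0.5) * step
```

Pieces whose means fall on the same level become interchangeable for selection purposes, which is how quantization can reduce the number of entries that need to be kept and searched.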

[0042] In step S6a, the searched phonemic piece data are
aligned (sorted) on the basis of the quantized mean F0. In
step S7a, the sorted phonemic piece data are registered in
correspondence with triphones ptr. As a result of
registration, an index indicating the correspondence between
the generated phonemic piece data and the triphones is formed
as shown in Fig. 7. In addition, as shown in Fig. 7, in the
pointers managed in correspondence with the triphones,
"phonemic piece position" indicating the location of each
phonemic piece data in the database 101a and "mean F0" are
managed in the form of a table.

[0043] Steps S1 to S7a are repeated for all possible
triphones. It is then checked in step S8a whether the
processing for all the triphones is complete. If it is
determined that the processing is not complete (NO in step
S8a), the flow returns to step S1. If it is determined that
the processing is complete (YES in step S8a), the processing
is terminated.

[0044] As described above, according to the second 
embodiment, in addition to the effects obtained in the 
first embodiment, the number of phonemic pieces and 
the calculation amount for search processing can be re- 
duced by using the quantized mean F0 of phonemic 
piece data. 

[Third Embodiment] 

[0045] In the second embodiment, after the portions between
the sorted phonemic piece data are interpolated, the
respective phonemic piece data may be registered in
correspondence with the triphones ptr. That is, an arrangement
may be made such that a phonemic piece position can be
searched out in the tables in the index for every quantized
mean F0 value. This processing will be described with
reference to Fig. 8.
[0046] Fig. 8 is a flow chart showing search process- 
ing executed in the third embodiment of the present in- 
vention. 

[0047] Note that the same step numbers in Fig. 8 de- 
note the same processes as those in Fig. 6 in the second 
embodiment, and a detailed description thereof will be 
omitted. 

[0048] In step S15, the portions between sorted pho- 
nemic piece data are interpolated. In step S7b, the in- 
terpolated phonemic piece data are registered in corre- 
spondence with triphones ptr. As a result of registration, 
an index indicating the correspondence between the 
generated phonemic piece data and the triphones is 
formed as shown in Fig. 9. In addition, as shown in Fig. 
9, in the pointers managed in correspondence with the 
triphones, "phonemic piece position" indicating the lo- 
cation of each phonemic piece data in the database 
101a and "mean F0" are managed in the form of a table.
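
One way to picture the interpolation of step S15 is as filling every quantization level that has no phonemic piece data with the position registered for an adjacent level that does. The sketch below makes that reading concrete under several assumptions: the sparse table is a dictionary from quantized mean F0 to position, the full list of levels is supplied by the caller, the nearest populated level is used, and ties go to the lower level. The name interpolate_table is hypothetical.

```python
from typing import Dict, Sequence


def interpolate_table(
    sparse: Dict[float, int], levels: Sequence[float]
) -> Dict[float, int]:
    """Build a dense table with one position for every quantized level.

    Levels present in sparse keep their own position. Missing levels
    borrow the position of the nearest populated level.
    """
    if not sparse:
        raise ValueError("at least one level must have phonemic piece data")
    known = sorted(sparse)
    dense: Dict[float, int] = {}
    for level in levels:
        # Sort key: distance first, then the level itself to break ties.
        nearest = min(known, key=lambda k: (abs(k - level), k))
        dense[level] = sparse[nearest]
    return dense
```

With such a dense table, obtaining a phonemic piece position for a quantized target mean F0 is a single dictionary lookup rather than a search.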
[0049] Steps S1 to S7b are repeated for all possible 
triphones. It is then checked in step S8b whether the 
processing for all the triphones is complete. If it is de- 
termined that the processing is not complete (NO in step 
S8b), the flow returns to step S1. If it is determined that
the processing is complete (YES in step S8b), the 
processing is terminated. 

[0050] As described above, according to the third em- 
bodiment, in addition to the effects obtained in the sec- 
ond embodiment, since the phonemic piece positions of 
all phonemic piece data are managed, the processing 
in step S11 in Fig. 4 can be simply implemented as the 
step of referring to a table. This can further simplify the 
processing. 

[0051] Note that the present invention may be applied to
either a system constituted by a plurality of pieces of
equipment (e.g., a host computer, an interface device, a
reader, a printer, and the like), or an apparatus consisting
of a single piece of equipment (e.g., a copying machine, a
facsimile apparatus, or the like).
[0052] The objects of the present invention are also achieved
by supplying, to the system or apparatus, a storage medium
which records a program code of a software program that can
realize the functions of the above-mentioned embodiments, and
by reading out and executing the program code stored in the
storage medium by a computer (or a CPU or MPU) of the system
or apparatus.

[0053] In this case, the program code itself read out from the
storage medium realizes the functions of the above-mentioned
embodiments, and the storage medium which stores the program
code constitutes the present invention.

[0054] As the storage medium for supplying the pro- 
gram code, for example, a floppy disk, hard disk, optical 
disk, magneto-optical disk, CD-ROM, CD-R, magnetic 
tape, nonvolatile memory card, ROM, and the like may 
be used. 

[0055] The functions of the above-mentioned embod- 
iments may be realized not only by executing the read- 
out program code by the computer but also by some or 
all of actual processing operations executed by an OS 
(operating system) running on the computer on the ba- 
sis of an instruction of the program code. 
[0056] Furthermore, the functions of the above-men- 
tioned embodiments may be realized by some or all of 
actual processing operations executed by a CPU or the 
like arranged in a function extension board or a function 
extension unit, which is inserted in or connected to the 
computer, after the program code read out from the stor- 
age medium is written in a memory of the extension 
board or unit. 

[0057] As many apparently widely different embodi- 
ments of the present invention can be made without de- 
parting from the spirit and scope thereof, it is to be un- 
derstood that the invention is not limited to the specific 
embodiments thereof except as defined in the append- 
ed claims. 

[0058] Further, the program code can be obtained in 
electronic form for example by downloading the code 
over a network such as the internet. Thus in accordance 
with another aspect of the present invention there is pro- 
vided an electrical signal carrying processor imple- 
mentable instructions for controlling a processor to carry 
out the method as hereinbefore described. 



Claims 

1. A speech synthesis apparatus having a database 
for managing phonemic piece data, characterized 
by comprising: 

generating means (103) for generating a sec- 
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data groups quantized by said quantization means, 
for which no corresponding phonemic data is 
present by using an average fundamental frequen- 
cy which is adjacent to the frequency and for which 
5 corresponding phonemic piece data is present. 

8. A speech synthesis apparatus for performing 
speech synthesis by using phonemic piece data 
managed by a database, characterized by compris- 
10 ing: 

storage means (101a) for storing a table for 
managing position information indicating a po- 
sition of phonemic piece data in the database 

?5 in correspondence with a phoneme obtained in 

consideration of a phonemic context made to 
correspond to the phonemic piece data; 
calculation means (103) for acquiring phone- 
mic context information of a phoneme as a syn- 

20 thesis target and fundamental frequencies cor- 

responding thereto and calculating an average 
of acquired fundamental frequencies; 
search means (103) for searching a phoneme 
group corresponding to the phonemic context 

2S information from the table; 

acquisition means (103) for acquiring, from the 
table, position information of phonemic piece 
data corresponding to a predetermined pho- 
neme of the phoneme group searched out by 

30 said search means, on the basis of the average 

of fundamental frequencies calculated by said 
calculation means; and 
changing means for (103) acquiring phonemic 
piece data indicated by the position information 

35 acquired by said acquisition means from the 

database, and changing a prosody of the ac- 
quired phonemic piece data. 



ond phoneme in consideration of a phonemic 
context for a first phoneme as a search target; 
search means (103) for searching said data- 
base for a phonemic piece data corresponding 
to the second phoneme; 
re-search means (103) for generating a third 
phoneme by changing the phonemic context on 
the basis of the search result obtained by said 
search means, and re-searching said database 
for phonemic piece data corresponding to the 
third phoneme; and 

registration means (103) for registering the 
search result obtained by said search means 
or said re-search means in a table in corre- 
spondence with the second or third phoneme. 

2. The apparatus according to claim 1 , wherein said 
registration means comprises 

calculation means for calculating an average 
fundamental frequency of phonemic piece data 
searched out by said search means or said re- 
search means, and 

sorting means for sorting the searched phone- 
mic piece data group on the basis of the aver- 
age fundamental frequency calculated by said 
calculation means, and 
registers the phonemic piece data group and 
the second or third phoneme in correspond- 
ence with each other according to an order in 
which the phonemic piece data group is sorted 
by said sorting means. 

3. The apparatus according to claim 1 , wherein the 
second phoneme is a triphone obtained in consid- 
eration of phonemic contexts of right and left pho- 
nemes of the first phoneme. 

4. The apparatus according to claim 1, wherein the 
third phoneme is a phoneme obtained in consider- 
ation of at least one of phonemic contexts of right 
and left phonemes of the first phoneme. 

5. The apparatus according to claim 1, wherein the 
third phoneme is a phoneme obtained in consider- 
ation of a left phonemic context of the first phoneme 
when the first phoneme is a vowel, and a right pho- 
nemic context of the first phoneme when the first 
phoneme is a consonant. 

6. The apparatus according to claim 2, wherein said 
registration means further comprises quantization 
means for quantizing an average fundamental fre- 
quency of the searched phonemic piece data. 

7. The apparatus according to claim 6, wherein said 
calculation means interpolates a frequency, of av- 
erage fundamental frequencies of phonemic piece 



9. The apparatus according to claim 8, wherein said 
changing means changes the prosody by using a 
pitch synchronous waveform overlap adding meth- 
od. 



10. The apparatus according to claim 8, wherein when 
a fundamental frequency of a phoneme obtained in
consideration of the phonemic context is quantized, 
said storage means manages the quantized funda- 
mental frequency in the table in correspondence 
with position information indicating a position in the 
database at which phonemic piece data corre- 
sponding to the phoneme is present. 

11. The apparatus according to claim 8, wherein when 
a fundamental frequency of a phoneme obtained in
consideration of the phonemic context is quantized,
said calculation means acquires phonemic context
information of a phoneme as a synthesis target, and
calculates an average of quantized fundamental
frequencies of the phoneme group.

12. A control method for a speech synthesis apparatus 
having a database for managing phonemic piece 
data, characterized by comprising: 

a generating step (S1) of generating a second 
phoneme in consideration of a phonemic con- 
text for a first phoneme as a search target; 
a search step (S2) of searching said database 
for a phonemic piece data corresponding to the 
second phoneme; 

a re-search step (S3) of generating a third pho- 
neme by changing the phonemic context on the 
basis of the search result obtained in said 
search step, and re-searching said database 
for phonemic piece data corresponding to the 
third phoneme; and 

a registration step (S7) of registering the search 
result obtained in said search step or said re-
search step in a table in correspondence with 
the second or third phoneme. 

13. The method according to claim 12, wherein said 
registration step comprises 

a calculation step of calculating an average fun- 
damental frequency of phonemic piece data 
searched out in said search step or said re- 
search step, and 

a sorting step of sorting the searched phonemic 
piece data group on the basis of the average 
fundamental frequency calculated in said cal- 
culation step, and 

registering the phonemic piece data group and 
the second or third phoneme in correspond- 
ence with each other according to an order in 
which the phonemic piece data group is sorted 
in said sorting step. 
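
The calculation, sorting and registration steps of claim 13 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `register_sorted`, the dictionary used as the table, the key string `"k.a.s"`, and the integer positions standing in for locations in the database are all assumptions made for illustration.

```python
from statistics import mean


def register_sorted(table, phoneme_key, pieces):
    """Register pieces under phoneme_key, ordered by average F0.

    pieces is a list of (position, f0_values) pairs. `position` is a
    hypothetical stand-in for where the phonemic piece data sits in the
    database; f0_values are the fundamental frequencies (Hz) of that piece.
    """
    # Calculation step: one average fundamental frequency per piece.
    scored = [(mean(f0_values), position) for position, f0_values in pieces]
    # Sorting step: ascending order of average F0 (the direction is an assumption).
    scored.sort(key=lambda item: item[0])
    # Registration step: store the sorted group under the context-dependent phoneme.
    table[phoneme_key] = scored
    return table


table = {}
register_sorted(
    table,
    "k.a.s",
    [(120, [210.0, 190.0]), (48, [100.0, 120.0]), (301, [150.0, 150.0])],
)
```

After the call, the entry for `"k.a.s"` lists positions 48, 301 and 120 in that order, because their average fundamental frequencies are 110, 150 and 200 Hz.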

14. The method according to claim 12, wherein the
second phoneme is a triphone obtained in consideration
of phonemic contexts of right and left phonemes
of the first phoneme.

15. The method according to claim 12, wherein the third 
phoneme is a phoneme obtained in consideration 
of at least one of phonemic contexts of right and left 
phonemes of the first phoneme. 

16. The method according to claim 12, wherein the third 
phoneme is a phoneme obtained in consideration 
of a left phonemic context of the first phoneme when 
the first phoneme is a vowel, and a right phonemic 
context of the first phoneme when the first phoneme 
is a consonant. 
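
The search and re-search steps of claims 12 and 14 to 16 can be sketched as follows. The sketch assumes that re-search is triggered only when the triphone search finds nothing, that the database is a dictionary keyed by (left, phoneme, right) tuples with `None` marking a dropped context, and that vowels are the five letters in `VOWELS`; none of these details is stated in the claims.

```python
VOWELS = set("aiueo")  # assumption: a simple five-vowel inventory


def candidate_keys(left, phoneme, right):
    """Yield search keys from the most specific context to the reduced one."""
    # "Second phoneme": triphone with both left and right context.
    yield (left, phoneme, right)
    if phoneme in VOWELS:
        # "Third phoneme" for a vowel: keep only the left context.
        yield (left, phoneme, None)
    else:
        # "Third phoneme" for a consonant: keep only the right context.
        yield (None, phoneme, right)


def search_with_backoff(database, left, phoneme, right):
    """Return (key, pieces) for the first key that has phonemic piece data."""
    for key in candidate_keys(left, phoneme, right):
        pieces = database.get(key)
        if pieces:
            return key, pieces
    return None, []


# Hypothetical database: key -> list of positions of phonemic piece data.
database = {("k", "a", None): [10, 42], ("s", "k", "a"): [7]}
vowel_hit = search_with_backoff(database, "k", "a", "s")
consonant_hit = search_with_backoff(database, "s", "k", "a")
miss = search_with_backoff(database, "a", "t", "o")
```

The key returned with the pieces is the phoneme under which the result would be registered in the table, so a later lookup can tell whether a triphone or a reduced context was matched.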

17. The method according to claim 13, wherein said
registration step further comprises a quantization
step of quantizing an average fundamental frequency
of the searched phonemic piece data.

18. The method according to claim 17, wherein said
calculation step comprises interpolating a frequency,
of average fundamental frequencies of phonemic
piece data groups quantized in said quantization
step, for which no corresponding phonemic data is
present by using an average fundamental frequency
which is adjacent to the frequency and for which
corresponding phonemic piece data is present.
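
The quantization of claim 17 and the interpolation of claim 18 can be illustrated with a short Python sketch. The uniform 10 Hz bin width, the function names, and the rule that a tie between two equally distant bins goes to the lower one are assumptions; the claims only require that an empty quantized frequency borrow from an adjacent frequency for which phonemic piece data is present.

```python
def quantize_f0(f0_hz, step_hz=10.0):
    """Map an average F0 in Hz to a bin index (uniform bins are an assumption)."""
    return int(f0_hz // step_hz)


def fill_empty_bins(bins, lo, hi):
    """Return a dict with an entry for every bin index in [lo, hi].

    bins maps a populated bin index to a hypothetical position of phonemic
    piece data. An empty bin borrows the entry of the nearest populated bin;
    on a tie the lower bin wins (an arbitrary choice for this sketch).
    """
    populated = sorted(bins)
    if not populated:
        raise ValueError("at least one populated bin is required")
    filled = {}
    for index in range(lo, hi + 1):
        nearest = min(populated, key=lambda p: (abs(p - index), p))
        filled[index] = bins[nearest]
    return filled


# Two pieces with average F0 of 118 Hz and 152 Hz land in bins 11 and 15.
bins = {quantize_f0(118.0): 48, quantize_f0(152.0): 301}
filled = fill_empty_bins(bins, 10, 16)
```

With every bin populated, a later lookup by quantized average fundamental frequency never comes back empty within the covered range.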

19. A control method for a speech synthesis apparatus
for performing speech synthesis by using phonemic
piece data managed by a database, characterized
by comprising:

a storage step of storing a table for managing
position information indicating a position of
phonemic piece data in the database in correspondence
with a phoneme obtained in consideration
of a phonemic context made to correspond
to the phonemic piece data;

a calculation step (S9) of acquiring phonemic
context information of a phoneme as a synthesis
target and fundamental frequencies corresponding
thereto and calculating an average of
acquired fundamental frequencies;

a search step (S10) of searching a phoneme
group corresponding to the phonemic context
information from the table;

an acquisition step (S12) of acquiring, from the
table, position information of phonemic piece
data corresponding to a predetermined phoneme
of the phoneme group searched out in
the search step, on the basis of the average of
fundamental frequencies calculated in said calculation
step; and

a changing step (S13) of acquiring phonemic
piece data indicated by the position information
acquired in said acquisition step from the database,
and changing a prosody of the acquired
phonemic piece data.


20. The method according to claim 19, wherein said 
changing step comprises changing the prosody by 
using a pitch synchronous waveform overlap add- 
ing method. 
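
The calculation, search and acquisition steps that precede the prosody change can be sketched in Python as below. The sketch assumes the table stores, for each context-dependent phoneme, a list of (average F0, position) pairs sorted by ascending average F0, and that the "predetermined phoneme" is the entry whose average is closest to the target average; the claims say only that the choice is made on the basis of the average. The pitch synchronous overlap-add step itself is not shown.

```python
from bisect import bisect_left
from statistics import mean


def select_position(table, phoneme_key, target_f0_values):
    """Return the position whose registered average F0 is closest to the target."""
    # Search step: fetch the phoneme group for this phonemic context.
    entries = table[phoneme_key]  # [(avg_f0, position), ...] sorted by avg_f0
    # Calculation step: average of the fundamental frequencies of the target.
    target = mean(target_f0_values)
    # Acquisition step: only the two entries around the insertion point can be closest.
    averages = [avg for avg, _ in entries]
    i = bisect_left(averages, target)
    neighbours = entries[max(i - 1, 0): i + 1]
    return min(neighbours, key=lambda entry: abs(entry[0] - target))[1]


# Hypothetical table contents for one context-dependent phoneme.
table = {"k.a.s": [(110.0, 48), (150.0, 301), (200.0, 120)]}
middle = select_position(table, "k.a.s", [160.0, 180.0])
below_range = select_position(table, "k.a.s", [90.0])
above_range = select_position(table, "k.a.s", [250.0])
```

The returned position would then be used to read the phonemic piece data from the database before its prosody is changed.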


21. The method according to claim 19, wherein when a
fundamental frequency of a phoneme obtained in
consideration of the phonemic context is quantized,
said storage step comprises managing the quantized
fundamental frequency in the table in correspondence
with position information indicating a
position in the database at which phonemic piece
data corresponding to the phoneme is present.

22. The method according to claim 19, wherein when a
fundamental frequency of a phoneme obtained in
consideration of the phonemic context is quantized,
said calculation step comprises acquiring phonemic
context information of a phoneme as a synthesis
target, and calculating an average of quantized
fundamental frequencies of the phoneme.

23. A computer-readable memory storing program
codes for controlling a speech synthesis apparatus
having a database for managing phonemic piece
data, characterized by comprising:

a program code for the generating step of generating
a second phoneme in consideration of
a phonemic context for a first phoneme as a
search target;

a program code for the search step of searching
said database for a phonemic piece data corresponding
to the second phoneme;

a program code for the re-search step of generating
a third phoneme by changing the phonemic
context on the basis of the search result
obtained in the search step, and re-searching
said database for phonemic piece data corresponding
to the third phoneme; and

a program code for the registration step of registering
the search result obtained in the search
step or the re-search step in a table in correspondence
with the second or third phoneme.

24. A computer-readable memory storing program
codes for controlling a speech synthesis apparatus
for performing speech synthesis by using phonemic
piece data managed by a database, characterized
by comprising:

a program code for the storage step of storing
a table for managing position information indicating
a position of phonemic piece data in the
database in correspondence with a phoneme
obtained in consideration of a phonemic context
made to correspond to the phonemic piece
data;

a program code for the calculation step of acquiring
phonemic context information of a phoneme
as a synthesis target and fundamental
frequencies corresponding thereto and calculating
an average of acquired fundamental frequencies;

a program code for the search step of searching
a phoneme group corresponding to the phonemic
context information from the table;

a program code for the acquisition step of acquiring,
from the table, position information of
phonemic piece data corresponding to a predetermined
phoneme of the phoneme group
searched out in the search step, on the basis
of the average of fundamental frequencies calculated
in the calculation step; and

a program code for the changing step of acquiring
phonemic piece data indicated by the position
information acquired in the acquisition step
from the database, and changing a prosody of
the acquired phonemic piece data.

25. A method of controlling speech synthesis apparatus
comprising searching a database to find phonemic
piece data corresponding to a target phoneme, the
search comprising the steps of:

generating a triphone representative of the target
phoneme and its left and right context information;

searching the database using the triphone as
target and, if the triphone is not found, generating
as a substitute target a diphone representative
of the target phoneme and one or other
of the left and right context information, followed
by re-searching the database using the
substitute target.

26. An electrical signal carrying processor implementable
instructions for controlling a processor to carry
out the method of any of claims 12 to 22 and 25.